RBCN: Rectified Binary Convolutional Networks with Generative Adversarial Learning
TABLE 3.2
Accuracy (%) of PCNN-22 and PCNN-40 (based on WRN-22 and WRN-40, respectively) on the CIFAR-10 dataset with different λ.

Model      λ = 1e-3   λ = 1e-4   λ = 1e-5   λ = 0
PCNN-22    91.92      92.79      92.24      91.52
PCNN-40    92.85      93.78      93.65      92.84
Despite the progress made in 1-bit quantization and network pruning, few works have
combined the two in a unified framework in which they reinforce each other. It is necessary
to introduce pruning techniques into 1-bit CNNs, since not all filters and kernels are equally
important or worth quantizing in the same way. One potential solution is to prune the
network first and then perform 1-bit quantization on the remaining weights to produce a
more compressed network. However, this solution ignores the difference between binarized
and full-precision parameters during pruning. A promising alternative is therefore to prune
the quantized network; yet designing a unified framework that combines quantization and
pruning remains an open question.
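As a rough illustration of the "prune, then binarize" baseline discussed above (not the RBCN method itself), the sketch below applies magnitude-based filter pruning to a convolutional weight tensor and then binarizes the surviving filters with the sign function plus a per-filter scale. The layer shapes, keep ratio, and helper name are placeholder assumptions.

```python
import torch

def prune_then_binarize(weight: torch.Tensor, keep_ratio: float = 0.5):
    """Naive 'prune, then binarize' baseline (illustrative, not RBCN).

    weight: full-precision conv filters of shape (out_ch, in_ch, k, k).
    keep_ratio: fraction of output filters kept after magnitude pruning.
    """
    out_ch = weight.shape[0]
    n_keep = max(1, int(out_ch * keep_ratio))

    # Rank filters by their L1 norm and keep the largest ones.
    importance = weight.abs().sum(dim=(1, 2, 3))
    keep_idx = torch.topk(importance, n_keep).indices
    pruned = weight[keep_idx]

    # Binarize the surviving weights; a per-filter scale preserves the
    # magnitude information that sign() alone would discard.
    scale = pruned.abs().mean(dim=(1, 2, 3), keepdim=True)
    binary = torch.sign(pruned) * scale
    return binary, keep_idx

# Example: a hypothetical 3x3 conv layer with 64 output and 32 input channels.
w = torch.randn(64, 32, 3, 3)
w_bin, kept = prune_then_binarize(w, keep_ratio=0.5)
```

Because the pruning decision here is made on the full-precision weights, it ignores how binarization later changes each filter's contribution, which is precisely the shortcoming noted above.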
To address these issues, we introduce the Rectified Binary Convolutional Network
(RBCN) [148] to train a BNN, in which a novel learning architecture is presented within a
GAN framework. Our motivation is based on the fact that GANs can match two data
distributions (here, those of the full-precision and 1-bit networks). This can also be viewed as
distilling/exploiting the full-precision model to benefit its 1-bit counterpart. For training RBCN,
the primary binarization process is illustrated in Fig. 3.18, where the full-precision model
and the 1-bit model (the generator) provide “real” and “fake” feature maps, respectively, to
the discriminators.
FIGURE 3.18
The framework for integrating the Rectified Binary Convolutional Network (RBCN) with
generative adversarial network (GAN) learning. The full-precision model provides “real”
feature maps, while the 1-bit model (as a generator) provides “fake” feature maps to
discriminators that try to distinguish “real” from “fake.” Meanwhile, the generator tries
to fool the discriminators. As this process is repeated, both the full-precision feature maps
and kernels (across all convolutional layers) are fully exploited to enhance the capacity of
the 1-bit model. Note that (1) the full-precision model is used only in learning, not in
inference; and (2) after training, the learned full-precision filters W are discarded, and only
the binarized filters Ŵ and the shared learnable matrices C are kept in RBCN for computing
the feature maps in inference.
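The adversarial coupling described in the caption can be sketched as follows. This is a minimal, assumed illustration of matching full-precision ("real") and 1-bit ("fake") feature maps with a small per-layer discriminator; it is not the full RBCN training procedure, which additionally involves the rectified convolution, kernel-level losses, and the learnable matrices C. The discriminator architecture, optimizers, and function names are placeholders.

```python
import torch
import torch.nn as nn

class FeatureDiscriminator(nn.Module):
    """Judges whether a feature map comes from the full-precision teacher
    ("real") or from the 1-bit generator ("fake"). Illustrative only."""

    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.net(x)  # one logit per feature map

def gan_step(real_feat, fake_feat, disc, opt_d, opt_g):
    """One adversarial update: real_feat from the full-precision model,
    fake_feat from the 1-bit generator (must require grad w.r.t. it)."""
    bce = nn.BCEWithLogitsLoss()

    # Discriminator step: label teacher features 1 ("real"), 1-bit features 0 ("fake").
    opt_d.zero_grad()
    d_loss = (bce(disc(real_feat.detach()), torch.ones(real_feat.size(0), 1)) +
              bce(disc(fake_feat.detach()), torch.zeros(fake_feat.size(0), 1)))
    d_loss.backward()
    opt_d.step()

    # Generator step: the 1-bit network tries to make its features look "real",
    # i.e., to make the discriminator misclassify them.
    opt_g.zero_grad()
    g_loss = bce(disc(fake_feat), torch.ones(fake_feat.size(0), 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

In such a setup, `opt_g` would cover the 1-bit generator's parameters and `opt_d` the discriminator's, and the step would typically be applied per convolutional layer alongside the usual task loss, so that the full-precision feature maps guide the binarized model throughout training.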